Text complexity and linguistic features: Their correlation in English and Russian
Abstract
Text complexity assessment is a challenging task requiring various linguistic aspects to be taken into consideration. The complexity level of a text should correspond to the reader's competence: a text that is too complicated may be incomprehensible, whereas one that is too simple may be boring. For many years, simple features such as the average length of words and sentences or vocabulary variety were used to assess readability. Thanks to the development of natural language processing methods, the set of text parameters used for evaluating readability has expanded significantly. In recent years, many articles have been published whose authors investigated the contribution of various lexical, morphological, and syntactic features to the readability level. Nevertheless, because the methods and corpora used in these studies are quite diverse, it is hard to draw general conclusions about the effectiveness of linguistic information for evaluating text complexity. Moreover, the cross-lingual impact of different features on various datasets has not been investigated. The purpose of this study is to conduct a large-scale comparison of features of different nature. We experimentally assessed seven commonly used feature types (readability, traditional features, morphological features, punctuation, syntax, frequency, and topic modeling) on six corpora in English and Russian, employing four common machine learning models: logistic regression, random forest, convolutional neural network, and feedforward neural network. One of the corpora, a corpus of fiction read by school students, was constructed for this experiment using a survey to ensure the objectivity of the labeling. We showed which feature types can significantly improve performance and analyzed their impact according to dataset characteristics, language, and data source.
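The "traditional" features mentioned above (average word length, average sentence length, and vocabulary variety) can be illustrated with a short sketch. This is not the authors' feature extractor. Several parts are assumptions: the naive regex-based sentence and word tokenization, the use of the type-token ratio as the measure of vocabulary variety, and the hypothetical function name `traditional_features`. The regexes are Unicode-aware, so the same code runs on English and Russian text.

```python
import re


def traditional_features(text: str) -> dict:
    """Hypothetical helper: three simple readability features for one text.

    Assumption: naive regex tokenization; abbreviations such as "e.g."
    will be split into extra sentences.
    """
    # Split sentences on runs of ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Words are runs of Unicode letters ([^\W\d_]), allowing inner ' or -.
    words = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text)
    if not words or not sentences:
        return {"avg_word_len": 0.0, "avg_sentence_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Mean word length in characters.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Vocabulary variety: distinct lowercased word forms / all words.
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }


if __name__ == "__main__":
    print(traditional_features("The cat sat. The dog ran away!"))
    print(traditional_features("Кот спит. Дом стоит."))
```

In a setup like the one described, vectors of such features per text would be fed to a classifier such as logistic regression or a random forest. Measuring vocabulary variety on raw word forms rather than lemmas is a simplification. For a highly inflected language like Russian, a lemma-based ratio would behave differently.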
Similar sources
Individual qualities and integrative motivation and their prediction of non-linguistic outcomes of learning English in intermediate Iranian students: A psychological perspective
This study investigated the predictability of variables from a motivational framework, as well as individual qualities, to predict three non-linguistic outcomes of language learning. Gardner's socio-educational model with its measures has been used in the current study. The individual qualities presented in this study include (1) age, (2) gender, and (3) language learning experience. The...
Explicitation in interlingual and intralingual translations of Ferdowsi's Shahnameh: A text-linguistic approach
This study examines and compares the differences and similarities between intralingual and interlingual translation, with a focus on text linguistics. For the comparison, it examines the frequency of explicitation in the intralingual and interlingual translations of Ferdowsi's Shahnameh.
Identifying the strategies Persian EFL learners use in reading an expository text in English and examining its relation to reading proficiency and motivation: A think-aloud study
The main aim of this study was to examine the types and extent of the strategies that Persian-speaking English-major students used while reading an English text. The study also examined differences in the strategies used by readers with high and low reading comprehension. The correlation between the strategies used and reading comprehension on the one hand, and between the strategies used and motivation on the other, was also tested...
Linguistic complexity: English vs. Polish, text vs. corpus
We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks after about two decades, while it might be consistent for the whole range of ranks for the inflected word forms. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant re...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts that formed two small research corpora were retrieved from two resources: the official USE database, as a reference point, and "Neznaika" (https://neznaika.pro/), a popular website used by pupils for USE training. The sizes of the two corpora are balanced: USE has 11934 tokens and "Neznaika" has 11918 tokens. We share Biber’...
Journal
Journal title: Russian Journal of Linguistics
Year: 2022
ISSN: 2312-9182, 2312-9212
DOI: https://doi.org/10.22363/2687-0088-30132